Goto

Collaborating Authors

 Newfoundland and Labrador




TextDoctor: Unified Document Image Inpainting via Patch Pyramid Diffusion Models

arXiv.org Artificial Intelligence

Digital versions of real-world text documents often suffer from issues like environmental corrosion of the original document, low-quality scanning, or human interference. Existing document restoration and inpainting methods typically struggle with generalizing to unseen document styles and handling high-resolution images. To address these challenges, we introduce TextDoctor, a novel unified document image inpainting method. Inspired by human reading behavior, TextDoctor restores fundamental text elements from patches and then applies diffusion models to entire document images instead of training models on specific document types. To handle varying text sizes and avoid out-of-memory issues, common in high-resolution documents, we propose using structure pyramid prediction and patch pyramid diffusion models. These techniques leverage multiscale inputs and pyramid patches to enhance the quality of inpainting both globally and locally. Extensive qualitative and quantitative experiments on seven public datasets validated that TextDoctor outperforms state-of-the-art methods in restoring various types of high-resolution document images.


Lightweight Authenticated Task Offloading in 6G-Cloud Vehicular Twin Networks

arXiv.org Artificial Intelligence

Task offloading management in 6G vehicular networks is crucial for maintaining network efficiency, particularly as vehicles generate substantial data. Integrating secure communication through authentication introduces additional computational and communication overhead, significantly impacting offloading efficiency and latency. This paper presents a unified framework incorporating lightweight Identity-Based Cryptographic (IBC) authentication into task offloading within cloud-based 6G Vehicular Twin Networks (VTNs). Utilizing Proximal Policy Optimization (PPO) in Deep Reinforcement Learning (DRL), our approach optimizes authenticated offloading decisions to minimize latency and enhance resource allocation. Performance evaluation under varying network sizes, task sizes, and data rates reveals that IBC authentication can reduce offloading efficiency by up to 50% due to the added overhead. Besides, increasing network size and task size can further reduce offloading efficiency by up to 91.7%. As a countermeasure, increasing the transmission data rate can improve the offloading performance by as much as 63%, even in the presence of authentication overhead. The code for the simulations and experiments detailed in this paper is available on GitHub for further reference and reproducibility [1].


paper-oras-neurips

Neural Information Processing Systems

Domain decomposition methods are widely used and effective in the approximation of solutions to partial differential equations. Yet the optimal construction of these methods requires tedious analysis and is often available only in simplified, structured-grid settings, limiting their use for more complex problems. In this work, we generalize optimized Schwarz domain decomposition methods to unstructured-grid problems, using Graph Convolutional Neural Networks (GCNNs) and unsupervised learning to learn optimal modifications at subdomain interfaces. A key ingredient in our approach is an improved loss function, enabling effective training on relatively small problems, but robust performance on arbitrarily large problems, with computational cost linear in problem size. The performance of the learned linear solvers is compared with both classical and optimized domain decomposition algorithms, for both structured-and unstructured-grid problems.


Mapping Galaxy Images Across Ultraviolet, Visible and Infrared Bands Using Generative Deep Learning

arXiv.org Artificial Intelligence

ABSTRACT We demonstrate that generative deep learning can translate galaxy observations across ultraviolet, visible, and infrared photometric bands. Leveraging mock observations from the Illustris simulations, we develop and validate a supervised image-to-image model capable of performing both band interpolation and extrapolation. The resulting trained models exhibit high fidelity in generating outputs, as verified by both general image comparison metrics (MAE, SSIM, PSNR) and specialized astronomical metrics (GINI coefficient, M20). Moreover, we show that our model can be used to predict real-world observations, using data from the DECaLS survey as a case study. These findings highlight the potential of generative learning to augment astronomical datasets, enabling efficient exploration of multi-band information in regions where observations are incomplete. This work opens new pathways for optimizing mission planning, guiding high-resolution follow-ups, and enhancing our understanding of galaxy morphology and evolution. INTRODUCTION Galaxies can appear quite different when observed across various photometric bands due to the wavelength-dependent nature of the light they emit (Kouroumpatzakis et al. 2023). In shorter wavelength bands, like the ultraviolet (U) band, galaxies reveal the presence of hot, young stars and regions of active star formation. Meanwhile, in optical bands (e.g., G, R), the light primarily comes from older, cooler stars. At longer wavelengths, such as in infrared bands, the emission is dominated by dust and the cooler, more evolved stellar populations. Studying galaxies across multiple photometric bands is essential because each band reveals different aspects of a galaxy's physical properties, offering a more complete picture of its structure, composition, and evolution. By combining observations from various bands, it is possible to trace a galaxy's star formation history, stellar populations, dust content, and even interactions with its environment. Observations made in multiple wavelengths require fewer inferences and lead to stronger conclusions being drawn about the observed galaxies.


Full Field Digital Mammography Dataset from a Population Screening Program

arXiv.org Artificial Intelligence

Breast cancer presents the second largest cancer risk in the world to women. Early detection of cancer has been shown to be effective in reducing mortality. Population screening programs schedule regular mammography imaging for participants, promoting early detection. Currently, such screening programs require manual reading. False-positive errors in the reading process unnecessarily leads to costly follow-up and patient anxiety. Automated methods promise to provide more efficient, consistent and effective reading. To facilitate their development, a number of datasets have been created. With the aim of specifically targeting population screening programs, we introduce NL-Breast-Screening, a dataset from a Canadian provincial screening program. The dataset consists of 5997 mammography exams, each of which has four standard views and is biopsy-confirmed. Cases where radiologist reading was a false-positive are identified. NL-Breast is made publicly available as a new resource to promote advances in automation for population screening programs.


BenthicNet: A global compilation of seafloor images for deep learning applications

arXiv.org Artificial Intelligence

Advances in underwater imaging enable the collection of extensive seafloor image datasets that are necessary for monitoring important benthic ecosystems. The ability to collect seafloor imagery has outpaced our capacity to analyze it, hindering expedient mobilization of this crucial environmental information. Recent machine learning approaches provide opportunities to increase the efficiency with which seafloor image datasets are analyzed, yet large and consistent datasets necessary to support development of such approaches are scarce. Here we present BenthicNet: a global compilation of seafloor imagery designed to support the training and evaluation of large-scale image recognition models. An initial set of over 11.4 million images was collected and curated to represent a diversity of seafloor environments using a representative subset of 1.3 million images. These are accompanied by 2.6 million annotations translated to the CATAMI scheme, which span 190,000 of the images. A large deep learning model was trained on this compilation and preliminary results suggest it has utility for automating large and small-scale image analysis tasks. The compilation and model are made openly available for use by the scientific community at https://doi.org/10.20383/103.0614.


Evaluating the Adversarial Robustness of Retrieval-Based In-Context Learning for Large Language Models

arXiv.org Artificial Intelligence

With the emergence of large language models, such as LLaMA and OpenAI GPT-3, In-Context Learning (ICL) gained significant attention due to its effectiveness and efficiency. However, ICL is very sensitive to the choice, order, and verbaliser used to encode the demonstrations in the prompt. Retrieval-Augmented ICL methods try to address this problem by leveraging retrievers to extract semantically related examples as demonstrations. While this approach yields more accurate results, its robustness against various types of adversarial attacks, including perturbations on test samples, demonstrations, and retrieved data, remains under-explored. Our study reveals that retrieval-augmented models can enhance robustness against test sample attacks, outperforming vanilla ICL with a 4.87% reduction in Attack Success Rate (ASR); however, they exhibit overconfidence in the demonstrations, leading to a 2% increase in ASR for demonstration attacks. Adversarial training can help improve the robustness of ICL methods to adversarial attacks; however, such a training scheme can be too costly in the context of LLMs. As an alternative, we introduce an effective training-free adversarial defence method, DARD, which enriches the example pool with those attacked samples. We show that DARD yields improvements in performance and robustness, achieving a 15% reduction in ASR over the baselines. Code and data are released to encourage further research: https://github.com/simonucl/adv-retreival-icl


Text2TimeSeries: Enhancing Financial Forecasting through Time Series Prediction Updates with Event-Driven Insights from Large Language Models

arXiv.org Artificial Intelligence

Time series models, typically trained on numerical data, are designed to forecast future values. These models often rely on weighted averaging techniques over time intervals. However, real-world time series data is seldom isolated and is frequently influenced by non-numeric factors. For instance, stock price fluctuations are impacted by daily random events in the broader world, with each event exerting a unique influence on price signals. Previously, forecasts in financial markets have been approached in two main ways: either as time-series problems over price sequence or sentiment analysis tasks. The sentiment analysis tasks aim to determine whether news events will have a positive or negative impact on stock prices, often categorizing them into discrete labels. Recognizing the need for a more comprehensive approach to accurately model time series prediction, we propose a collaborative modeling framework that incorporates textual information about relevant events for predictions. Specifically, we leverage the intuition of large language models about future changes to update real number time series predictions. We evaluated the effectiveness of our approach on financial market data.